Final Project Guidelines

0.1 Your Task

For this project, you are expected to use a two-way ANOVA to investigate the relationship between one numerical response variable and two categorical explanatory variables. You are permitted to use the same dataset as your Midterm Project, so long as there are at least two categorical variables to choose from.

Note

If you would like to analyze a discrete numerical variable (e.g., number of pets) as a categorical variable, you will need to convert that variable into a categorical variable in R, because R treats any variable stored as numbers as numerical.

You will need Dr. Theobold’s help to perform this task. Dr. Theobold will work with you to convert your variable as long as you request help before Friday at 4pm.
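For orientation only, here is a minimal sketch of what such a conversion can look like. The data frame `survey` and its columns `num_pets` and `stress` are hypothetical, made up for this illustration; the right approach for your dataset may differ.

```r
# Hypothetical data: `num_pets` is a made-up discrete numerical variable.
survey <- data.frame(
  num_pets = c(0, 1, 2, 1, 0, 2),
  stress   = c(6.1, 4.8, 3.9, 5.2, 6.4, 4.1)
)

# R stores num_pets as numbers, so it is treated as numerical.
class(survey$num_pets)

# factor() converts it to a categorical variable with one level per distinct value.
survey$num_pets <- factor(survey$num_pets)

class(survey$num_pets)
levels(survey$num_pets)
```

After the conversion, R treats each distinct value of `num_pets` as its own group rather than as a point on a number line.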

1 Introduction

1.1 Data Description

In 4-5 sentences describe:

  • how the data were collected
  • the context of the data (e.g., are the data from a published study?)
  • the background of the research problem (e.g., why were the data collected?)

1.2 Questions of Interest

State the question(s) of interest you will address with your statistical analysis. The more specifically you define the question of interest here, the easier the rest of the analysis and report will be. The research questions should start with, “What is the relationship between…” and should be as specific as possible. Your Findings section should directly address the question(s) you pose here.

2 Methods

This section should lay out the steps, decisions, and logic leading to the statistical model you will use to answer the research question of interest.

  • Describe the response and explanatory variables, how they were measured and their associated units. For categorical variables, describe the levels of the categorical variable.

  • Produce data visualizations exploring the relationship(s) you are interested in investigating and assessing the need for a second explanatory variable. Everyone will produce three visualizations:

    • a visualization of the relationship between your response variable and explanatory variable 1
    • a visualization of the relationship between your response variable and explanatory variable 2
    • a visualization of the relationship between your response variable and both explanatory variables

Visualizations with categorical variables

In Lab 3 you learned how to make density ridge plots, the recommended visualization for the relationship between a numerical variable and a categorical variable. For your project you are required to use density ridge plots.

Every visualization should have nicely formatted axis labels!
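The three required visualizations could be sketched as follows. This assumes the ggplot2 and ggridges packages (the code from Lab 3 may differ) and uses R's built-in `ToothGrowth` data as a stand-in for your own dataset, with `len` as the response and `supp` and `dose` as the two explanatory variables.

```r
library(ggplot2)
library(ggridges)

# ToothGrowth ships with R; dose is stored as numbers, so convert it to a factor.
tg <- ToothGrowth
tg$dose <- factor(tg$dose)

# Response vs. explanatory variable 1
plot_supp <- ggplot(tg, aes(x = len, y = supp)) +
  geom_density_ridges() +
  labs(x = "Tooth length", y = "Supplement type")

# Response vs. explanatory variable 2
plot_dose <- ggplot(tg, aes(x = len, y = dose)) +
  geom_density_ridges() +
  labs(x = "Tooth length", y = "Dose (mg/day)")

# Response vs. both explanatory variables: one on the y-axis, one as fill color
plot_both <- ggplot(tg, aes(x = len, y = dose, fill = supp)) +
  geom_density_ridges(alpha = 0.5) +
  labs(x = "Tooth length", y = "Dose (mg/day)", fill = "Supplement type")

print(plot_supp)
print(plot_dose)
print(plot_both)
```

The `labs()` calls replace the raw variable names with readable axis and legend labels.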

  • Describe what you see in the visualizations, making direct references to the plots!

  • Outline the statistical model you will use to answer the question(s) of interest that you stated previously.

Not visually selecting a model

Unlike the Midterm Project, you will not be choosing what statistical model to fit based on the visualizations. Everyone will be using a two-way ANOVA interaction model to analyze their data, so in this section you are to describe the statistical model you will use and why.
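As a starting point for that description, one common way to write a two-way ANOVA interaction model is sketched below. The symbols are assumptions chosen for illustration and may differ from the notation used in class.

\[
y_{ijk} = \mu + \tau_i + \gamma_j + (\tau\gamma)_{ij} + \varepsilon_{ijk}
\]

Here \(y_{ijk}\) is the response for observation \(k\) at level \(i\) of explanatory variable 1 and level \(j\) of explanatory variable 2, \(\mu\) is the overall mean, \(\tau_i\) and \(\gamma_j\) are the effects of the two explanatory variables, and \(\varepsilon_{ijk}\) is the random error. The interaction term \((\tau\gamma)_{ij}\) allows the relationship between the response and one explanatory variable to differ across the levels of the other.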

  • Evaluate the conditions of the statistical model you propose to use.

Condition violations

If the study design and/or your visualizations indicate that certain model conditions are violated, you are expected to do your best to remedy these violations.
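As one rough illustration, the sketch below checks the spread of the response within each group and tries a log transformation as a possible remedy. Both the check and the remedy are assumptions made for this example, not necessarily the approach expected in this course. It uses R's built-in `ToothGrowth` data as a stand-in for your own.

```r
tg <- ToothGrowth
tg$dose <- factor(tg$dose)

# Standard deviation of the response within each combination of the
# two explanatory variables.
group_sd <- aggregate(len ~ supp + dose, data = tg, FUN = sd)
group_sd

# Ratio of the largest to the smallest group standard deviation; a large
# ratio suggests the equal-variance condition deserves a closer look.
max(group_sd$len) / min(group_sd$len)

# One possible remedy for unequal spread or right skew: analyze the log of
# the response instead (only sensible when every response value is positive).
tg$log_len <- log(tg$len)
aggregate(log_len ~ supp + dose, data = tg, FUN = sd)
```

If you transform the response, the visualizations, model, and conclusions should all be stated on the transformed scale.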

3 Findings

In this section you will write up your findings for your question of interest.

3.1 Two-Way ANOVA Interaction Model

  • Fit a two-way ANOVA interaction model
  • Obtain the ANOVA table for the model

Tip

Use the tidy() function from the broom package to produce a nicely formatted ANOVA table.

  • Based on the ANOVA table, state what decision was reached regarding the hypothesis for the two-way ANOVA interaction model.
  • Based on the decision you made, state what you can conclude regarding the relationship between the variables in your model.

Danger

You must state what \(\alpha\) threshold was used when reaching your hypothesis test decisions.
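The steps in this subsection can be sketched as follows. This assumes the model is fit with `aov()` (your course code may use a different fitting function) and uses R's built-in `ToothGrowth` data as a stand-in. The threshold of 0.05 is only an example value.

```r
library(broom)

tg <- ToothGrowth
tg$dose <- factor(tg$dose)

# supp * dose expands to supp + dose + supp:dose
# (both main effects plus their interaction).
interaction_model <- aov(len ~ supp * dose, data = tg)

# tidy() returns the ANOVA table as a data frame, one row per term.
interaction_table <- tidy(interaction_model)
interaction_table

# Compare the interaction p-value with a threshold chosen in advance.
alpha <- 0.05  # assumed threshold for this sketch
interaction_p <- interaction_table$p.value[interaction_table$term == "supp:dose"]
interaction_p < alpha
```

The row labeled `supp:dose` holds the test for the interaction, which is the hypothesis this subsection asks you to address.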

3.2 Two-Way ANOVA Additive Model

Fitting an additive model

If you failed to reject \(H_0\) for your interaction model, your next step is to fit an additive two-way ANOVA model.

  • Fit a two-way ANOVA additive model
  • Obtain the ANOVA table for the model
  • Based on the ANOVA table, state what decision was reached for each hypothesis in the two-way ANOVA additive model.
  • Based on the decision you made, state what you can conclude regarding the relationship between your variables.
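A minimal sketch of the additive fit, again assuming `aov()` and the built-in `ToothGrowth` data as a stand-in for your own:

```r
library(broom)

tg <- ToothGrowth
tg$dose <- factor(tg$dose)

# supp + dose includes both main effects but no interaction term.
additive_model <- aov(len ~ supp + dose, data = tg)

additive_table <- tidy(additive_model)
additive_table
```

The table has one row per explanatory variable, each with its own p-value, so there is one decision to state for each.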

3.3 One-Way ANOVA Model

Fitting a one-way ANOVA model

If you failed to reject \(H_0\) for at least one of your explanatory variables, your next step is to fit a one-way ANOVA model.

Specifically, you should remove the explanatory variable with the largest p-value, even if both variables have p-values larger than your chosen \(\alpha\) level.

  • Fit a one-way ANOVA model
  • Obtain the ANOVA table for the model
  • Based on the ANOVA table, state what decision was reached for the hypothesis in the one-way ANOVA model.
  • Based on the decision you made, state what you can conclude regarding the relationship between your variables.
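The mechanics of dropping the variable with the largest p-value and refitting can be sketched as below. This assumes `aov()` and uses the built-in `ToothGrowth` data purely to show the steps; whether this step applies to your data depends on your own additive model results.

```r
library(broom)

tg <- ToothGrowth
tg$dose <- factor(tg$dose)

additive_table <- tidy(aov(len ~ supp + dose, data = tg))

# Drop the Residuals row (its p-value is NA), then find the largest p-value.
term_rows <- additive_table[additive_table$term != "Residuals", ]
drop_term <- term_rows$term[which.max(term_rows$p.value)]
keep_term <- setdiff(term_rows$term, drop_term)

# Refit with only the remaining explanatory variable.
one_way_model <- aov(reformulate(keep_term, response = "len"), data = tg)
one_way_table <- tidy(one_way_model)
one_way_table
```

`reformulate()` builds the formula from the variable name, so the same code works whichever variable is kept.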

3.4 Mean-Only Model

If you failed to reject \(H_0\) for your one-way ANOVA model, your next step is to fit a mean-only model (one mean for every observation).
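A minimal sketch of such a model, assuming it is fit with `lm()` and an intercept-only formula, with the built-in `ToothGrowth` data as a stand-in:

```r
tg <- ToothGrowth

# len ~ 1 fits an intercept only: a single mean shared by every observation.
mean_only_model <- lm(len ~ 1, data = tg)

# The estimated intercept is the overall sample mean of the response.
coef(mean_only_model)
mean(tg$len)
```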

3.5 Conclusions

Based on the results of your analysis, what is your conclusion for the question(s) of interest? Connect your conclusion(s) to the relationships you saw in the visualizations you made.

In this section you should also describe whether you believe the tests you performed are “reliable”; that is, were any of the conditions required of a two-way or one-way ANOVA model violated?

4 Scope of Inference

Write a 4-5 sentence statement on what can be inferred from the design of the study and the results of your statistical analysis. Specifically, answer these two questions and comment on their implications:

  • Based on the sampling method used, to what larger population can you generalize the results of your analysis?

Tip

Your statement needs to include a description of (1) how the data were collected, and (2) the population to whom the results can be applied. You must justify your reasoning for #2 using information from the design of the study.

  • Based on the design of the study, what type of statements can be made about the relationship between the explanatory and response variables?

Tip

Your statement needs to include a description of (1) how the study was designed, and (2) what statements can be made about the relationship between the variables. You must justify your reasoning for #2 by making direct reference to the variables included in the study. General statements about “this was an observational study” are insufficient.